Project: Investigate a Dataset - [Medical Appointment No Shows]

Table of Contents

Introduction

Dataset Description

This dataset contains information on more than 100,000 medical appointments for patients from different neighborhoods in Brazil. It addresses an important question: why does a person make a doctor's appointment, receive all the instructions, and then not show up? I will ask a series of questions and answer them to look for an explanation of this problem.

Note: In columns with 0/1 values, 1 means True and 0 means False.

Question(s) for Analysis

Data Wrangling

Data Cleaning

Column types that need to be changed:

1- PatientId is float64 and needs to be converted to str, because I do not want describe() or other calculations to treat it as numeric data.

2- AppointmentID is int and I will convert it to str for the same reason as PatientId.

3- ScheduledDay and AppointmentDay need to be converted to datetime, because I will use them to extract the month and the day.
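The three conversions above could look roughly like the sketch below. The three-row DataFrame is a made-up sample standing in for the real CSV, and the sketch assumes the IDs are whole numbers so that going through int64 is safe.

```python
import pandas as pd

# Hypothetical sample; the real data would be loaded from the CSV file.
df = pd.DataFrame({
    "PatientId": [29872499824296.0, 558997776694438.0, 4262962299951.0],
    "AppointmentID": [5642903, 5642503, 5642549],
    "ScheduledDay": ["2016-04-29T18:38:08Z", "2016-04-29T16:08:27Z", "2016-04-29T16:19:04Z"],
    "AppointmentDay": ["2016-04-29T00:00:00Z", "2016-04-29T00:00:00Z", "2016-04-29T00:00:00Z"],
})

# Go through int64 first so the ID strings do not end in ".0"
# (assumption: every ID is a whole number).
df["PatientId"] = df["PatientId"].astype("int64").astype(str)
df["AppointmentID"] = df["AppointmentID"].astype(str)

# Parse both date columns so month and day can be extracted later.
for col in ["ScheduledDay", "AppointmentDay"]:
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)
print(df["ScheduledDay"].dt.month.tolist())
```

Once the IDs are strings, describe() no longer reports a mean or standard deviation for them.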

Renaming some columns

Changing show values:

Before the change, the No-show column holds 'No' if the patient showed up to their appointment and 'Yes' if they did not show up.

After renaming the column to show, I change 'No' to 1 and 'Yes' to 0 to avoid any misunderstanding and to match the other columns (1 means True, 0 means False).

So in the whole dataset now 1 = True and 0 = False.
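A minimal sketch of the rename and recode, using a made-up four-row sample in place of the real data:

```python
import pandas as pd

# Hypothetical sample with the original column name and values.
df = pd.DataFrame({"No-show": ["No", "Yes", "No", "No"]})

# Rename the column, then flip the meaning:
# 'No' (the patient did show up) -> 1, 'Yes' (the patient missed it) -> 0.
df = df.rename(columns={"No-show": "show"})
df["show"] = df["show"].map({"No": 1, "Yes": 0})

print(df["show"].tolist())  # [1, 0, 1, 1]
```

Using map with an explicit dictionary turns any unexpected value into NaN, which makes a typo in the raw data easy to spot.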

Changing the gender column values:

From F and M to Female and Male.

Check for missing values

There are no missing values.

Check for duplicate rows

There are no duplicated rows either.
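Both checks can be sketched as follows. The small DataFrame is a hypothetical stand-in for the cleaned dataset, and its column names are assumptions.

```python
import pandas as pd

# Hypothetical sample standing in for the cleaned dataset.
df = pd.DataFrame({
    "age": [62, 56, 8],
    "gender": ["Female", "Male", "Female"],
    "show": [1, 1, 0],
})

# Number of missing values in each column.
missing_per_column = df.isnull().sum()

# Number of rows that are exact copies of an earlier row.
n_duplicates = int(df.duplicated().sum())

print(missing_per_column)
print("duplicated rows:", n_duplicates)
```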

Exploratory Data Analysis

First: let's get an overall view of the relationship between the show column and the other features:

Are the variables in the dataset correlated?

After the two steps, here is what I found:

Hypertension and diabetes have a moderate positive correlation (0.43).

Hypertension and age have a moderate positive correlation (0.5).

Scholarship and show have a very weak negative correlation (-0.02). Patients without a scholarship are slightly more likely to show up, but the effect is small.

Alcoholism and show have no correlation (0.0002), so alcoholism does not appear to affect whether the patient shows up.

Sms_received and show have a weak negative correlation (-0.13). Patients who did not receive an SMS were somewhat more likely to show up.
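Coefficients like the ones above can be obtained from a correlation matrix. The sketch below uses DataFrame.corr, which computes Pearson correlation by default, on a made-up eight-row sample, so its numbers will not match the values reported here.

```python
import pandas as pd

# Hypothetical sample; values are invented for illustration only.
df = pd.DataFrame({
    "age": [5, 23, 41, 58, 64, 72, 35, 80],
    "hypertension": [0, 0, 0, 1, 1, 1, 0, 1],
    "diabetes": [0, 0, 0, 0, 1, 1, 0, 1],
    "show": [1, 0, 1, 1, 1, 0, 1, 1],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))

# Correlation of every feature with show, sorted from most negative to most positive.
show_corr = corr["show"].drop("show").sort_values()
print(show_corr.round(2))
```

When reading the result, a coefficient close to 0 (such as -0.02) indicates almost no linear relationship, whatever its sign.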

What is the ratio of females to males?

Does gender affect whether the patient shows up?

Let's first take a look at the counts of show and no-show by gender.

This is just an overall count by gender of who showed up on the appointment day and who did not. To know whether gender affects showing up, I need to calculate the percentage for each gender.

Gender does not affect showing up: the percentages of females and males who show up are almost equal.
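The percentage step can be sketched like this. Because show is coded 0/1, its mean within each gender is the share of patients who showed up. The sample data and the lowercase column names are assumptions.

```python
import pandas as pd

# Hypothetical sample: four appointments per gender.
df = pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"],
    "show":   [1, 1, 1, 0, 1, 1, 0, 1],
})

# Mean of a 0/1 column = fraction of ones, so this is the show-up rate in percent.
show_rate = df.groupby("gender")["show"].mean() * 100
print(show_rate)
```

Comparing rates rather than raw counts matters here because the two genders do not have the same number of appointments.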

Does gender combined with scholarship affect whether the patient shows up?

Yes, gender combined with scholarship affects showing up, because there is a negative correlation between scholarship and showing up. For both males and females, patients without a scholarship are more likely to show up.
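A comparison of this kind, gender together with one binary feature, can be sketched with a two-level groupby. The data below is invented and the column names are assumptions. The same pattern applies to hypertension, diabetes, alcoholism, handicap, and SMS by swapping the second column.

```python
import pandas as pd

# Hypothetical sample: two appointments for each gender/scholarship combination.
df = pd.DataFrame({
    "gender":      ["Female"] * 4 + ["Male"] * 4,
    "scholarship": [0, 0, 1, 1, 0, 0, 1, 1],
    "show":        [1, 1, 1, 0, 1, 1, 0, 0],
})

# Show-up rate for each combination, with scholarship values as columns.
rates = df.groupby(["gender", "scholarship"])["show"].mean().unstack("scholarship")
print(rates)
```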

Does gender combined with hypertension affect whether the patient shows up?

Yes, gender combined with hypertension affects showing up, because there is a positive relationship between having hypertension and showing up. In both genders, patients with hypertension show up on the appointment day more often than patients without it.

Does gender combined with diabetes affect whether the patient shows up?

Yes, gender combined with diabetes affects showing up, because there is a positive relationship between having diabetes and showing up. In both genders, patients with diabetes show up on the appointment day more often than patients without it. The pattern is the same as for hypertension.

Does gender combined with alcoholism affect whether the patient shows up?

Alcoholism does not affect whether the patient shows up, and there is no strong relationship between the two, because the direction differs by gender:

Male: those who drink show up on the appointment day more often than those who do not.

Female: the opposite of males. Those who do not drink show up on the appointment day more often than those who do.

Does gender combined with handicap affect whether the patient shows up?

Here I have a problem visualizing handicap alongside the other variables: 0 means not handicapped, while 1, 2, 3, and 4 all mean the person is handicapped. So I am going to convert 1, 2, 3, and 4 to 1.
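One way to collapse the handicap levels into a single flag is sketched below. The column name handicap and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample covering every level that appears in the column.
df = pd.DataFrame({"handicap": [0, 1, 2, 0, 3, 4]})

# Any level above 0 becomes 1, so the column is 0/1 like the other features.
df["handicap"] = (df["handicap"] > 0).astype(int)

print(df["handicap"].tolist())  # [0, 1, 1, 0, 1, 1]
```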

Yes, gender combined with handicap affects showing up, because there is a positive correlation between being handicapped and showing up. In both genders, handicapped patients show up on the appointment day more often than those who are not handicapped.

Does receiving an SMS affect showing up for the appointment, based on gender?

No, receiving an SMS does not improve showing up for either gender. In both genders, patients who did not receive an SMS showed up more often than those who did. So a missing SMS is not the reason for not showing up.

What is the distribution of age?

Some analysis of the age column:

min: -1. It is just one row and it does not make sense, so I dropped it.

max: 115. It is unusual but possible, so I will leave it as it is.

mode: 0, with 3539 values. Zero might mean babies who have not been born yet, but that does not make sense, so I am going to convert these values to NaN.

There are outliers at age 115 (five people).
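The age cleaning described above can be sketched as follows, on a made-up six-row sample with an assumed column name age.

```python
import pandas as pd

# Hypothetical sample containing each special case: -1, 0, and 115.
df = pd.DataFrame({"age": [-1, 0, 0, 34, 62, 115]})

# Drop the row with a negative age.
df = df[df["age"] >= 0].copy()

# Replace age 0 with NaN; mask() sets NaN wherever the condition is True.
df["age"] = df["age"].mask(df["age"] == 0)

# describe() skips NaN, so the zeros no longer pull the statistics down.
print(df["age"].describe())
```

The value 115 is kept, matching the decision to leave the maximum unchanged.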

Does the age of the patient affect showing up on the appointment day?

Yes, the age of the patient affects showing up on the appointment day. There is a positive correlation between age and showing up: as patients get older, the likelihood of showing up increases.

Which days do people choose most for appointments?

My suspicion was right: there are five rows in which the appointment day is before the scheduled day. This does not make sense, so I will drop them.
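A sketch of how such rows could be found and dropped: compute the number of days between scheduling and the appointment, then keep only non-negative values. The column names scheduled_day, appointment_day, and waiting_days are hypothetical, and the comparison uses calendar dates only, on the assumption that the appointment column carries no time of day.

```python
import pandas as pd

# Hypothetical sample: same-day visit, an 18-day wait, and one inconsistent row.
df = pd.DataFrame({
    "scheduled_day": pd.to_datetime(
        ["2016-05-10 08:15:00", "2016-05-02 14:30:00", "2016-05-11 09:00:00"]
    ),
    "appointment_day": pd.to_datetime(["2016-05-10", "2016-05-20", "2016-05-09"]),
})

# normalize() resets the time to midnight so a same-day appointment counts as 0 days.
df["waiting_days"] = (
    df["appointment_day"].dt.normalize() - df["scheduled_day"].dt.normalize()
).dt.days

# Keep only rows where the appointment is on or after the scheduling date.
df = df[df["waiting_days"] >= 0]

print(df["waiting_days"].tolist())  # [0, 18]
```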

Appointments are concentrated on Monday to Friday.

The most popular day for appointments is Wednesday.

Sunday has no appointments.

Saturday has the fewest appointments.

Which day of the week has the highest percentage of showing up?

Thursday has the highest percentage of showing up, followed by Wednesday.

Which day of the week has the highest percentage of no-shows?

Saturday has the highest percentage of no-shows, followed by Friday.
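The per-weekday counts and percentages can be sketched in one groupby. The sample below is invented and the column names are assumptions, so its numbers do not reproduce the findings above.

```python
import pandas as pd

# Hypothetical sample: two Mondays, two Wednesdays, one Thursday, one Saturday.
df = pd.DataFrame({
    "appointment_day": pd.to_datetime(
        ["2016-05-02", "2016-05-02", "2016-05-04", "2016-05-04", "2016-05-05", "2016-05-07"]
    ),
    "show": [1, 0, 1, 1, 1, 0],
})

# Name of the weekday for each appointment.
df["weekday"] = df["appointment_day"].dt.day_name()

# Number of appointments and show-up rate per weekday.
summary = df.groupby("weekday")["show"].agg(appointments="count", show_pct="mean")
summary["show_pct"] = summary["show_pct"] * 100
summary["no_show_pct"] = 100 - summary["show_pct"]

print(summary.sort_values("appointments", ascending=False))
```

A weekday with no appointments, such as Sunday in this dataset, simply does not appear in the result.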

What is the ratio of showing up to not showing up in each month?

The show-up ratio is almost the same in all three months. We cannot name the month with the most show-ups from the counts alone, because the three months differ hugely in size. Show-up counts: May 64037, June 21568, April 2602.

For the same reason, we cannot name the highest or lowest no-show month from the counts. No-show counts: May 16799, June 4882, April 633.
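To compare months of such different sizes, the counts can be turned into rates. The short calculation below uses the figures quoted above, assuming the first set of counts are show-ups and the second set are no-shows.

```python
# Counts quoted in the text (assumed: first set = showed up, second set = no-show).
shows = {"April": 2602, "May": 64037, "June": 21568}
no_shows = {"April": 633, "May": 16799, "June": 4882}

show_rate = {}
for month in shows:
    total = shows[month] + no_shows[month]
    # Share of that month's appointments where the patient showed up.
    show_rate[month] = shows[month] / total
    print(f"{month}: {total} appointments, {show_rate[month]:.1%} showed up")
```

All three rates land within a few percentage points of each other, which is consistent with the observation that the ratio is almost the same across months.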

Do the waiting days affect whether the patient shows up?

The number of waiting days does not affect whether the patient shows up.
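One way to check a claim like this is to group the waiting time into ranges and compare the show-up rate in each. The sketch below is only an illustration: the data is invented, and the column name waiting_days and the bin edges are assumptions.

```python
import pandas as pd

# Hypothetical sample of waiting times in days and the show flag.
df = pd.DataFrame({
    "waiting_days": [0, 0, 3, 5, 12, 20, 45, 60],
    "show":         [1, 1, 1, 0, 1, 0, 1, 0],
})

# Bin edges are right-inclusive: (-1, 0], (0, 7], (7, 30], (30, 365].
bins = [-1, 0, 7, 30, 365]
labels = ["same day", "1-7 days", "8-30 days", "over 30 days"]
df["waiting_group"] = pd.cut(df["waiting_days"], bins=bins, labels=labels)

# Show-up rate within each waiting-time range.
rate_by_wait = df.groupby("waiting_group", observed=True)["show"].mean()
print(rate_by_wait)
```

If the rates are close across the ranges, that supports the conclusion; large gaps between ranges would call for a closer look.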

Conclusions

Limitations